123 research outputs found

    Detection of protein catalytic residues at high precision using local network properties

    Background: Identifying the active site of an enzyme is a crucial step in functional studies. While protein sequences and structures can be experimentally characterized, determining which residues build up an active site is not a straightforward process. In the present study a new method for the detection of protein active sites is introduced. This method uses local network descriptors derived from protein three-dimensional structures to determine whether a residue is part of an active site. It thus does not involve any sequence alignment or structure similarity to other proteins. A scoring function is elaborated over a set of more than 220 proteins having different structures and functions, in order to detect protein catalytic sites with a high precision, i.e. with a minimal rate of false positives. Results: The scoring function was based on the counts of first-neighbours on side-chain contacts, third-neighbours and residue type. Precision of the detection using this function was 28.1%, which represents a more than three-fold increase compared to combining closeness centrality with residue surface accessibility, a function which was proposed in recent years. The performance of the scoring function was also analysed in detail over a smaller set of eight proteins. For the detection of 'functional' residues, which were involved either directly in catalytic activity or in the binding of substrates, precision reached a value of 72.7% on this second set. These results suggested that our scoring function was effective at detecting not only catalytic residues, but also any residue that is part of the functional site of a protein. Conclusion: Having been validated on the majority of known structural families, this method should prove useful for the detection of active sites in any protein with unknown function, and for direct application to the design of site-directed mutagenesis experiments.

    Multi-Domain Norm-referenced Encoding Enables Data Efficient Transfer Learning of Facial Expression Recognition

    People can innately recognize human facial expressions in unnatural forms, such as when depicted on the unusual faces drawn in cartoons or when applied to an animal's features. However, current machine learning algorithms struggle with out-of-domain transfer in facial expression recognition (FER). We propose a biologically-inspired mechanism for such transfer learning, which is based on norm-referenced encoding, where patterns are encoded in terms of difference vectors relative to a domain-specific reference vector. By incorporating domain-specific reference frames, we demonstrate high data efficiency in transfer learning across multiple domains. Our proposed architecture provides an explanation for how the human brain might innately recognize facial expressions on varying head shapes (humans, monkeys, and cartoon avatars) without extensive training. Norm-referenced encoding also allows the intensity of the expression to be read out directly from neural unit activity, similar to face-selective neurons in the brain. Our model achieves a classification accuracy of 92.15% on the FERG dataset with extreme data efficiency. We train our proposed mechanism with only 12 images, including a single image of each class (facial expression) and one image per domain (avatar). In comparison, the authors of the FERG dataset achieved a classification accuracy of 89.02% with their FaceExpr model, which was trained on 43,000 images.
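
The idea of norm-referenced encoding described in this abstract can be illustrated with a minimal sketch. This is not the authors' model: the function names, the two-dimensional toy features, and the dot-product readout against hypothetical prototype directions are all assumptions made for illustration. A pattern is encoded as its difference from a domain-specific reference vector; the length of that difference serves as the expression intensity and its direction as the expression category.

```python
import math


def norm_referenced_encode(pattern, reference):
    """Encode a pattern as (intensity, unit direction) relative to a reference vector."""
    diff = [p - r for p, r in zip(pattern, reference)]
    intensity = math.sqrt(sum(d * d for d in diff))
    if intensity == 0.0:
        # The pattern coincides with the reference (neutral face): no direction.
        return 0.0, [0.0] * len(diff)
    return intensity, [d / intensity for d in diff]


def classify(pattern, reference, prototypes):
    """Return (label, intensity); label is None when the pattern equals the reference.

    `prototypes` maps a label to a unit direction vector (hypothetical values).
    """
    intensity, direction = norm_referenced_encode(pattern, reference)
    if intensity == 0.0:
        return None, 0.0
    # Pick the prototype direction best aligned with the difference vector.
    best = max(
        prototypes,
        key=lambda label: sum(a * b for a, b in zip(direction, prototypes[label])),
    )
    return best, intensity


# Toy example: the same prototypes are reused across two domains by swapping
# only the domain-specific reference vector.
prototypes = {"happy": [1.0, 0.0], "angry": [0.0, 1.0]}
human_result = classify([2.0, 0.0], [0.0, 0.0], prototypes)
avatar_result = classify([5.0, 8.0], [5.0, 5.0], prototypes)
```

In this sketch only the reference vector changes between domains, which is one way to read the abstract's claim that a single image per domain can suffice.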

    Multistate Effects in Calculations of the Electronic Coupling Element for Electron Transfer Using the Generalized Mulliken−Hush Method

    A simple diagnostic is developed for the purpose of determining when a third state must be considered to calculate the electronic coupling element for a given pair of diabatic states within the context of the generalized Mulliken−Hush approach (Chem. Phys. Lett. 1996, 275, 15−19). The diagnostic is formulated on the basis of Löwdin partitioning theory. In addition, an effective 2-state GMH expression is derived for the coupling as it is modified by the presence of the third state. Results are presented for (i) a model system involving charge transfer from ethylene to methaniminium cation, (ii) a pair of donor−acceptor-substituted acridinium ions, and (iii) (dimethylamino)benzonitrile, and the diagnostic is shown to be a useful indicator of the importance of multistate effects. The effective 2-state GMH expression is also shown to yield excellent agreement with the exact 3-state GMH results in most cases. For cases involving more than three interacting states a similar diagnostic is presented and several approximations to the full n-state GMH result are explored

    PDBWiki

    Background: The success of community projects such as Wikipedia has recently prompted a discussion about the applicability of such tools in the life sciences. However, there is currently no consensus about how best to achieve this goal. Methodology/Principal Findings: Here we present a community knowledge base for the annotation of biological molecular structures that addresses some of these issues. This Wiki-style database consists of one structured page for each entry in the Protein Data Bank (PDB) and allows users to attach categorised comments and discussions to the entries. The core data for each entry is shown as a summary and can be used for searching and navigation via categories. A user-editable list of database cross references is automatically included in each page. As in a database, it is possible to produce tabular reports and 'structure galleries' based on user-defined queries. PDBWiki runs in parallel to the PDB and is automatically synchronised every week. Conclusions/Significance: PDBWiki (http://www.pdbwiki.org) is a simple but usable system that serves as a bug-tracker, discussion forum and community annotation system for the structures in the PDB. We believe that PDBWiki can serve as a model for better understanding how to capture community knowledge in the biological sciences.

    Optimized Null Model for Protein Structure Networks

    Much attention has recently been given to the statistical significance of topological features observed in biological networks. Here, we consider residue interaction graphs (RIGs) as network representations of protein structures with residues as nodes and inter-residue interactions as edges. Degree-preserving randomized models have been widely used for this purpose in biomolecular networks. However, such a single summary statistic of a network may not be detailed enough to capture the complex topological characteristics of protein structures and their network counterparts. Here, we investigate a variety of topological properties of RIGs to find a well-fitting network null model for them. The RIGs are derived from a structurally diverse protein data set at various distance cut-offs and for different groups of interacting atoms. We compare the network structure of RIGs to several random graph models. We show that 3-dimensional geometric random graphs, which model spatial relationships between objects, provide the best fit to RIGs. We investigate the relationship between the strength of the fit and various protein structural features. We show that the fit depends on protein size, structural class, and thermostability, but not on quaternary structure. We apply our model to the identification of significantly over-represented structural building blocks, i.e., network motifs, in protein structure networks. As expected, choosing geometric graphs as a null model results in the most specific identification of motifs. Our geometric random graph model may facilitate further graph-based studies of protein conformation space and have important implications for protein structure comparison and prediction. The choice of a well-fitting null model is crucial for finding structural motifs that play an important role in protein folding, stability and function. To our knowledge, this is the first study that addresses the challenge of finding an optimized null model for RIGs, by comparing various RIG definitions against a series of network models.
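
For readers unfamiliar with the null model named in this abstract, the following is a minimal sketch of a 3-dimensional geometric random graph. It is not the authors' code: sampling the points uniformly in the unit cube, and the parameter names `n`, `radius` and `seed`, are assumptions made for illustration. Nodes are random points in space, and two nodes are joined whenever their Euclidean distance is at most the chosen radius.

```python
import math  # math.dist requires Python 3.8 or later
import random


def geometric_random_graph(n, radius, seed=None):
    """Return (points, edges) for n random points in the unit cube.

    An edge (i, j) with i < j is created whenever the two points lie
    within `radius` of each other.
    """
    rng = random.Random(seed)
    points = [(rng.random(), rng.random(), rng.random()) for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= radius:
                edges.add((i, j))
    return points, edges


# A radius larger than the cube diagonal (sqrt(3)) connects every pair of nodes.
_, complete_edges = geometric_random_graph(20, 2.0, seed=1)
sample_points, sample_edges = geometric_random_graph(30, 0.3, seed=7)
```

The radius plays the role that the distance cut-off plays when a residue interaction graph is built from a structure, which is why this family of graphs is a natural candidate for comparison.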

    Residue contact-count potentials are as effective as residue-residue contact-type potentials for ranking protein decoys

    Background: For over 30 years potentials of mean force have been used to evaluate the relative energy of protein structures. The most commonly used potentials define the energy of residue-residue interactions and are derived from the empirical analysis of the known protein structures. However, single-body residue 'environment' potentials, although widely used in protein structure analysis, have not been rigorously compared to these classical two-body residue-residue interaction potentials. Here we do not try to combine the two different types of residue interaction potential, but rather to assess their independent contribution to scoring protein structures. Results: A data set of nearly three thousand monomers was used to compare pairwise residue-residue 'contact-type' propensities to single-body residue 'contact-count' propensities. Using a large and standard set of protein decoys we performed an in-depth comparison of these two types of residue interaction propensities. The scores derived from the contact-type and contact-count propensities were assessed using two different performance metrics and were compared using 90 different definitions of residue-residue contact. Our findings show that both types of score perform equally well on the task of discriminating between near-native protein decoys. However, in a statistical sense, the contact-count based scores were found to carry more information than the contact-type based scores. Conclusion: Our analysis has shown that the performance of either type of score is very similar on a range of different decoys. This similarity suggests a common underlying biophysical principle for both types of residue interaction propensity. However, several features of the contact-count based propensity suggest that it should be used in preference to the contact-type based propensity. Specifically, it has been shown that contact-counts can be predicted from sequence information alone. In addition, the use of a single-body term allows for efficient alignment strategies using dynamic programming, which is useful for fold recognition, for example. These facts, combined with the relative simplicity of the contact-count propensity, suggest that contact-counts should be studied in more detail in the future.

    PDBWiki: added value through community annotation of the Protein Data Bank

    The success of community projects such as Wikipedia has recently prompted a discussion about the applicability of such tools in the life sciences. Currently, there are several such ‘science-wikis’ that aim to collect specialist knowledge from the community into centralized resources. However, there is no consensus about how to achieve this goal. For example, it is not clear how best to integrate data from established, centralized databases with that provided by ‘community annotation’. We created PDBWiki, a scientific wiki for the community annotation of protein structures. The wiki consists of one structured page for each entry in the Protein Data Bank (PDB) and allows the user to attach categorized comments to the entries. Additionally, each page includes a user-editable list of cross-references to external resources. As in a database, it is possible to produce tabular reports and ‘structure galleries’ based on user-defined queries or lists of entries. PDBWiki runs in parallel to the PDB, separating original database content from user annotations. PDBWiki demonstrates how collaboration features can be integrated with primary data from a biological database. It can be used as a system for better understanding how to capture community knowledge in the biological sciences. For users of the PDB, PDBWiki provides a bug-tracker, discussion forum and community annotation system. To date, user participation has been modest, but is increasing. The user-editable cross-references section has proven popular, with the number of linked resources more than doubling from 17 originally to 39 today.

    CMView: Interactive contact map visualization and analysis

    Summary: Contact maps are a valuable visualization tool in structural biology. They are a convenient way to display proteins in two dimensions and to quickly identify structural features such as domain architecture, secondary structure and contact clusters. We developed a tool called CMView which integrates rich contact map analysis with 3D visualization using PyMol. Our tool provides functions for contact map calculation from structure, basic editing, visualization in contact map and 3D space and structural comparison with different built-in alignment methods. A unique feature is the interactive refinement of structural alignments based on user selected substructures. Availability: CMView is freely available for Linux, Windows and MacOS. The software and a comprehensive manual can be downloaded from http://www.bioinformatics.org/cmview/. The source code is licensed under the GNU General Public License. Contact: [email protected], [email protected]
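
As an illustration of what "contact map calculation from structure" involves, here is a minimal sketch. It is not CMView's implementation: the use of one representative coordinate per residue (for example the C-alpha atom), the 8.0 Å default cutoff, and the `min_seq_sep` parameter are assumptions made for this example. Two residues are marked as being in contact when their representative atoms lie within the cutoff distance.

```python
import math  # math.dist requires Python 3.8 or later


def contact_map(coords, cutoff=8.0, min_seq_sep=1):
    """Return a symmetric boolean matrix of residue-residue contacts.

    coords      -- one (x, y, z) tuple per residue, in sequence order
    cutoff      -- maximum distance (same units as coords) for a contact
    min_seq_sep -- minimum sequence separation |i - j| to be considered
    """
    n = len(coords)
    cmap = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_seq_sep, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = True
    return cmap


# Toy chain: three residues spaced 3.8 units apart, and a fourth far away.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
```

Plotting such a matrix with residue index on both axes gives the two-dimensional view of a structure that the abstract describes.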

    Optic Flow Statistics and Intrinsic Dimensionality

    Different kinds of visual sub-structures can be distinguished by the intrinsic dimensionality of the local signals. The concept of intrinsic dimensionality has been mostly exercised using discrete formulations. A recent work (Kruger and Felsberg, 2003; Felsberg and Kruger, 2003) introduced a continuous definition and showed that the inherent structure of the intrinsic dimensionality has essentially the form of a triangle. The current work analyzes the distribution of signals according to the continuous interpretation of intrinsic dimensionality and the relation to orientation and optic flow features of image patches. Among other things, we give a quantitative interpretation of the distribution of signals according to their intrinsic dimensionality that reveals specific patterns associated with established sub-structures in computer vision. Furthermore, we link quantitative and qualitative properties of the distribution of optic-flow error estimates to these patterns.